
Data growth and its impact on the SCOP database: new developments

Author: A. Andreeva
A. G. Murzin
Altschul
Andreeva
Berman
C. Chothia
Chandonia
D. Howorth
Finn
J.-M. Chandonia
Lo Conte
Moroz
Murzin
S. E. Brenner
T. J. P. Hubbard
Wheeler
Yooseph
Publication venue: Oxford University Press
Publication date: 13/11/2007

The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with the rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures according to their detected relationships at the Family and Superfamily levels, in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop
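The six-level hierarchy described above can be pictured as a tree in which every domain carries one label per level. The sketch below is an illustration only, not SCOP's production code: it represents a domain's classification as a tuple ordered from Class down to Species and reports the most specific level at which two domains share a node. The names `deepest_shared_level`, `Lineage`, `hb_alpha` and `myoglobin` are hypothetical, and the example labels are merely modelled on SCOP's globin entries.

```python
from typing import Optional, Tuple

# SCOP levels from most general to most specific (the abstract lists them
# bottom-up: Species, Protein, Family, Superfamily, Fold, Class).
LEVELS: Tuple[str, ...] = ("Class", "Fold", "Superfamily", "Family", "Protein", "Species")

# Hypothetical representation: one label per level, ordered like LEVELS.
Lineage = Tuple[str, str, str, str, str, str]


def deepest_shared_level(a: Lineage, b: Lineage) -> Optional[str]:
    """Return the most specific level at which two domains share a node.

    The classification is a tree, so two domains share a node at some level
    only if they share every node above it. Walk top-down and stop at the
    first mismatch; return None if they already differ at the Class level.
    """
    shared: Optional[str] = None
    for level, label_a, label_b in zip(LEVELS, a, b):
        if label_a != label_b:
            break
        shared = level
    return shared


# Illustrative lineages modelled on the globin entries (labels are examples).
hb_alpha: Lineage = ("All alpha proteins", "Globin-like", "Globin-like",
                     "Globins", "Hemoglobin, alpha-chain", "Human")
myoglobin: Lineage = ("All alpha proteins", "Globin-like", "Globin-like",
                      "Globins", "Myoglobin", "Sperm whale")

if __name__ == "__main__":
    print(deepest_shared_level(hb_alpha, myoglobin))  # Family
```

The sketch only compares labels that have already been assigned; detecting a Family or Superfamily relationship for a new structure, which the update protocol relies on, is a separate and much harder problem.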

King's Research Portal

SimShiftDB; local conformational restraints derived from chemical shift similarity searches on a large synthetic database

Author: B Seavey
FS Altschul
G Cornilescu
J Söding
J-M Chandonia
JND Battey
Murray Coles
S Karlin
S Neal
Simon W. Ginzinger
SW Ginzinger
Publication venue: Springer Netherlands
Publication date: 01/01/2009

We present SimShiftDB, a new program to extract conformational data from protein chemical shifts using structural alignments. The alignments are obtained by searching a large database containing 13,000 structures and their corresponding back-calculated chemical shifts. SimShiftDB makes use of chemical shift data to provide accurate results even in the case of low sequence similarity, and with even coverage of the conformational search space. We compare SimShiftDB to HHSearch, a state-of-the-art sequence-based search tool, and to TALOS, the current standard tool for the task. We show that for a significant fraction of the predicted similarities, SimShiftDB outperforms the other two methods. In particular, the high coverage afforded by the larger database often allows predictions to be made for residues not involved in canonical secondary structure, where TALOS predictions are both less frequent and more error-prone. Thus SimShiftDB can be seen as a complement to currently available methods.

Crossref

Springer - Publisher Connector

PubMed Central

MPG.PuRe

Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks

Author: Altschul S. F., Madden, T. L., Sch
Baldi P., Brunak, S., Frasconi, P.
CHANDONIA J-M
Crooks G. E. &
Kinjo A. R. &
Kinjo A. R., Horimoto, K. &
Lee B. &
Li W., Jaroszewski, L. &
Nishikawa K. &
Pollastri G., Baldi, P., Fariselli
Rost B.
TATENO Y
Publication venue: Biophysical Society of Japan
Publication date: 01/01/2005

Prediction of one-dimensional protein structures, such as secondary structures and contact numbers, is useful for three-dimensional structure prediction and important for understanding the sequence-structure relationship. Here we present a new machine-learning method, critical random networks (CRNs), for predicting one-dimensional structures, and apply it, with position-specific scoring matrices, to the prediction of secondary structures (SS), contact numbers (CN), and residue-wise contact orders (RWCO). The present method achieves, on average, a $Q_3$ accuracy of 77.8% for SS and correlation coefficients of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS prediction is comparable to that of other state-of-the-art methods, and that of the CN prediction is a significant improvement over previous methods. We give a detailed formulation of the CRN-based prediction scheme and examine the context dependence of prediction accuracies. To study nonlinear and multi-body effects, we compare the CRN-based method with a purely linear method based on position-specific scoring matrices. Although not superior to the CRN-based method, the surprisingly good accuracy achieved by the linear method highlights the difficulty of extracting higher-order structural features from amino acid sequence beyond those provided by the position-specific scoring matrices.

Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for publication in BIOPHYSIC
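The Q_3 accuracy and correlation coefficients quoted above are standard evaluation measures. The sketch below assumes the usual definitions: Q_3 is the percentage of residues whose three-state label (helix H, strand E, coil C) matches the observed label, and the CN and RWCO scores are Pearson correlation coefficients between predicted and observed per-residue values. The function names `q3` and `pearson` are hypothetical, and the sketch computes scores for one chain; how the paper averages across proteins is not specified here.

```python
import math
from typing import Sequence


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E or C) is predicted correctly."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed per-residue values (e.g. CN or RWCO).

    Undefined (raises ZeroDivisionError) if either sequence is constant.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(round(q3("HHHECCC", "HHHCCCC"), 1))                   # 85.7 (6 of 7 correct)
    print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # 1.0
```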

arXiv.org e-Print Archive

Crossref

PROMALS3D web server for accurate multiple protein sequence and structure alignments

Author: Bateman
Chandonia
Do
Edgar
Henikoff
Holm
J. Pei
Jones
Katoh
Lipman
M. Tang
Murzin
N. V. Grishin
Notredame
O'Sullivan
Pei
Simossis
Thompson
Wu
Zhang
Zhu
Publication venue: Oxford University Press

Multiple sequence alignments are essential in computational sequence and structural analysis, with applications in homology detection, structure modeling, function prediction and phylogenetic analysis. We report the PROMALS3D web server for constructing alignments of multiple protein sequences and/or structures using information from available 3D structures, database homologs and predicted secondary structures. PROMALS3D shows higher alignment accuracy than a number of other advanced methods. Input to the PROMALS3D web server can be protein sequences in FASTA format, protein structures in PDB format and/or user-defined alignment constraints. The output page provides alignments in several formats, including a colored alignment augmented with useful information about sequence grouping, predicted secondary structures and consensus sequences. Intermediate results of sequence and structural database searches are also available. The PROMALS3D web server is available at: http://prodata.swmed.edu/promals3d/

Crossref

PubMed Central

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

Author: A Andreeva
A Bateman
A Elofsson
A Lupas
A McPherson
A Sali
AE Todd
AJ Enright
B Rost
C Sander
C Vogel
CA Orengo
CH Wu
Christine A Orengo
D Baker
D Busso
D Vitkup
DT Jones
FMG Pearl
GA Reeves
I Letunic
IV Grigoriev
J Liu
J Park
J Thornton
J Westbrook
JA Ranea
JC Norvell
JC Wootton
JD Watson
JM Chandonia
K Karplus
KT Simons
M Linial
M Skovgaard
N Siew
PJ Kersey
R Sanchez
RA Laskowski
RC Stevens
RI Sadreyev
RL Marsden
Russell L Marsden
SA Lesley
SE Brenner
SH Kim
SK Burley
SR Eddy
TC Terwilliger
Tony A Lewis
W Minor
W Tian
Y Kim
Y Yan
Publication venue: BioMed Central
Publication date: 01/03/2007

BACKGROUND: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), target selection is primarily focussed on structurally characterising protein families that, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and into the effort that may be required to achieve comprehensive structural coverage across all protein families. RESULTS: In this analysis we derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportion of structurally uncharacterised families is accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families while also pursuing additional targets from large, structurally characterised protein superfamilies. CONCLUSION: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.
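The abstract refers to the domain coverage of the genomes without defining the measure. One common choice is residue coverage: the fraction of all residues in a proteome that fall inside at least one assigned domain (for example from CATH, Pfam-A or Newfam). The sketch below assumes that definition; the paper may instead count proteins or domains, and the function names and input layout (`covered_residues`, `residue_coverage`, 1-based inclusive segments) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: one domain assignment is a 1-based, inclusive residue range.
Segment = Tuple[int, int]


def covered_residues(segments: List[Segment]) -> int:
    """Count residues covered by at least one segment; overlaps are counted once."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end + 1:
            # Gap before this segment: close the current merged run.
            if cur_start is not None and cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent segment: extend the current run.
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def residue_coverage(lengths: Dict[str, int], assignments: Dict[str, List[Segment]]) -> float:
    """Fraction of all residues in a proteome that fall inside any assigned domain."""
    covered = sum(covered_residues(assignments.get(p, [])) for p in lengths)
    return covered / sum(lengths.values())


if __name__ == "__main__":
    lengths = {"protA": 300, "protB": 200}
    # protA has two overlapping assignments (e.g. from different resources); protB has none.
    assignments = {"protA": [(1, 120), (100, 250)], "protB": []}
    print(residue_coverage(lengths, assignments))  # 0.5
```

Merging overlapping segments before counting matters when annotations from several domain resources are combined, since the same residues would otherwise be counted more than once.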

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

Specialized dynamical properties of promiscuous residues revealed by simulated conformational ensembles

Author: Abecasis G. R.
Alessandro Pandini
Altschul S. F.
Altshuler D. M.
Amadei A.
Aranda B.
Arianna Fornili
Bahar I.
Berendsen H.
Bhardwaj N.
Bobay B. G.
Boehr D. D.
Bogan A. A.
Bordogna A.
Bouvier B.
Brookes A. J.
Camacho C.
Carbonell P.
Chandonia J.-M.
Cover T. M.
Cukuroglu E.
Cumming G.
Daily M. D.
Dasgupta B.
Davis F. P.
de Groot B. L.
De Simone A.
del Sol A.
DeLano W.
Dobbins S. E.
Dong Q.
Doruker P.
Dosztányi Z.
Dunbrack R. L.
Dyson H. J.
Echave J.
Ekman D.
Erijman A.
Essmann U.
Eyrisch S.
Fernández A.
Ferrer-Costa C.
Fong J. H.
Fornili A.
Franca Fraternali
Fraternali F.
Goldenberg O.
Haliloglu T.
Hamosh A.
Han J.-D. J.
Hess B.
Higurashi M.
Hub J. S.
Hui-Chun Lu
Humphris E. L.
Jeong H.
Jones S.
Jorgensen W.
Kabsch W.
Kar G.
Keskin O.
Kiel C.
Kim P. M.
Kim S.
Kleinjung J.
Kohn J. E.
Kortemme T.
Krissinel E.
Kuttner Y. Y.
Kuzu G.
Lange O. F.
Li X.
Liu L.
Lounnas V.
Maguid S.
Margreitter C.
Martin A. C. R.
Meireles L.
Meireles L. M. C.
Micheletti C.
Mittag T.
Münz M.
Nussinov R.
Pandini A.
Park B. H.
Patil A.
Peters J. H.
Petrov D.
Poirot O.
Qin H.
R-Development-Core-Team
Rajamani D.
Roulston M.
Rousseeuw P.
Schlitter J.
Schäfer H.
Seeliger D.
Sherry S. T.
Stein A.
Tsai C.-J.
Tuncbag N.
Tyagi M.
van der Spoel D.
Van Gunsteren W.
Vogel C.
Volkman B. F.
Wells J. A.
Winget J. M.
Wolfe R.
Yogurtcu O. N.
Zen A.
Zhang Q. C.
Zheng W.
Zhu X.
Publication venue: American Chemical Society (ACS)
Publication date: 18/10/2013

The ability to interact with different partners is one of the most important features of proteins. Proteins that bind a large number of partners (hubs) have often been associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the isolated state of the protein. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with different degrees of binding promiscuity were calculated and compared considering side-chain and backbone motions, the latter on both a local and a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionarily related proteins can be modulated by fine-tuning of the interface dynamics. The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein-protein interaction prediction and design methods. © 2013 American Chemical Society
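The abstract does not name the exact flexibility and correlation measures used. Two standard choices for a conformational ensemble are the per-residue root-mean-square fluctuation (RMSF) about the mean position and the normalised displacement cross-correlation between residues i and j, $C_{ij} = \langle \Delta r_i \cdot \Delta r_j \rangle / \sqrt{\langle |\Delta r_i|^2 \rangle \langle |\Delta r_j|^2 \rangle}$. The pure-Python sketch below computes both under the assumption that the frames are already superposed on a common reference; the function names are hypothetical and this is not a reproduction of the study's analysis.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]
# frames[t][i] is the position of residue i (e.g. its C-alpha) in frame t.
# Assumption: frames are already superposed onto a common reference.
Ensemble = List[List[Vec]]


def mean_positions(frames: Ensemble) -> List[Vec]:
    """Average position of each residue over the ensemble."""
    n = len(frames)
    means: List[Vec] = []
    for i in range(len(frames[0])):
        xs, ys, zs = zip(*(frame[i] for frame in frames))
        means.append((sum(xs) / n, sum(ys) / n, sum(zs) / n))
    return means


def displacements(frames: Ensemble) -> List[List[Vec]]:
    """Per-frame displacement of each residue from its mean position."""
    means = mean_positions(frames)
    return [
        [(p[0] - m[0], p[1] - m[1], p[2] - m[2]) for p, m in zip(frame, means)]
        for frame in frames
    ]


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def rmsf(frames: Ensemble) -> List[float]:
    """Root-mean-square fluctuation of each residue about its mean position."""
    disp = displacements(frames)
    n = len(frames)
    return [math.sqrt(sum(dot(d[i], d[i]) for d in disp) / n) for i in range(len(frames[0]))]


def dynamic_correlation(frames: Ensemble, i: int, j: int) -> float:
    """Normalised covariance of residues i and j, in [-1, 1].

    The 1/n averaging factors cancel between numerator and denominator,
    so plain sums over frames are used.
    """
    disp = displacements(frames)
    cij = sum(dot(d[i], d[j]) for d in disp)
    cii = sum(dot(d[i], d[i]) for d in disp)
    cjj = sum(dot(d[j], d[j]) for d in disp)
    return cij / math.sqrt(cii * cjj)


if __name__ == "__main__":
    # Two frames in which residues 0 and 1 move together along x.
    frames = [[(1.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
              [(-1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]]
    print(rmsf(frames))                      # [1.0, 1.0]
    print(dynamic_correlation(frames, 0, 1))  # 1.0
```

RMSF captures how much a residue moves, while the correlation captures whether two residues move together; the abstract's finding that promiscuous residues are both more flexible and more organised in their motion involves both kinds of quantity.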

Crossref

PubMed Central

King's Research Portal

Brunel University Research Archive

Extending CATH: increasing coverage of the protein structure universe and linking structure with function

Author: A. B. Clegg
A. L. Cuff
Ashburner
Bairoch
Buchan
C. A. Orengo
Chandonia
Cuff
D. Jones
Dessailly
Grabowski
Hendrickson
I. Sillitoe
J. Thornton
Kanehisa
M. Pellegrini-Calace
N. Furnham
Neumann
Orengo
R. Rentzsch
Rahman
Redfern
Ruepp
T. Lewis
Taylor
Todd
Publication venue: Oxford University Press
Publication date: 19/11/2010

CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within each superfamily and include structural alignments of close and distant relatives, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.

Crossref

LSHTM Research Online

PubMed Central

FLORA: a novel method to predict protein function from structure in diverse superfamilies

Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives, where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG; the method makes no prior prediction of functional sites, nor does it assume specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms popular structure comparison methods and a leading function prediction method, with a 2–3-fold increase in coverage at low error rates. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures using only patterns of structural conservation of all residues.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

A framework for protein structure classification and identification of novel protein structures

Author: AC Martin
AC Murzin
AJ Enright
AP Singh
C Cortes
CA Orengo
D Chivian
D Frishman
G Getz
HK Saini
IN Shindyalov
J Gough
J Hou
JE Gewehr
Jignesh M Patel
JM Chandonia
L Holm
L Lo Conte
M Madera
N Beckmann
O Çamoglu
P Røgen
R Day
S Cheek
S Van Dongen
T Madej
You Jung Kim
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, automated and accurate methods for protein classification are increasingly needed. RESULTS: In this paper we present a unified framework for protein structure classification and identification of novel protein structures. The framework consists of a set of components for comparing, classifying, and clustering protein structures. These components allow us to accurately classify proteins into known folds, to detect new protein folds, and to cluster the new folds. In our evaluation with SCOP 1.69, our method correctly classifies 86.0%, 87.7%, and 90.5% of new domains at the family, superfamily, and fold levels, respectively. Furthermore, for protein domains that belong to new domain families, our method is able to produce clusters that closely correspond to the new families in SCOP 1.69. As a result, our method can also be used to suggest new classification groups that contain novel folds. CONCLUSION: We have developed a method called proCC for automatically classifying and clustering domains. The method is effective in classifying new domains and suggesting new domain families, and it is also very efficient. A web site offering access to proCC is freely available a

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

BALL - biochemical algorithms library 1.3

Author: A Hildebrandt
A Moll
A Rurainski
A Savelyev
AB Chowdry
AK Dehof
Alexander Rurainski
Andreas Bertsch
Andreas Hildebrandt
Andreas Moll
Anna Katharina Dehof
B Chapman
B Kneissl
BR Brooks
C Steinbeck
Chemical Computing Group
CK Materese
Daniel Stöckel
E Segev
Hans-Peter Lenhof
J Chandonia
J Ponder
J Wegner
J Xu
LLC Schrödinger
M Brylinski
M Phillips
M Röttig
Marcel Schumann
N Kalisman
N Maghsoudi
Nora C Toussaint
O Kohlbacher
Oliver Kohlbacher
Sabine C Mueller
Stefan Nickels
TA Halgren
W Kabsch
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The Biochemical Algorithms Library (BALL) is a comprehensive rapid application development framework for structural bioinformatics. It provides an extensive C++ class library of data structures and algorithms for molecular modeling and structural bioinformatics. Using BALL as a programming toolbox not only greatly reduces application development time but also helps ensure stability and correctness: error-prone reimplementations of complex algorithms are replaced with calls into a library that has been well tested by a large number of developers. In the ten years since its original publication, BALL has seen a substantial increase in functionality and numerous other improvements. Results: Here, we discuss BALL's current functionality and highlight the key additions and improvements: support for additional file formats, molecular editing functionality, new molecular mechanics force fields, novel energy minimization techniques, docking algorithms, and support for cheminformatics. Conclusions: BALL is available for all major operating systems, including Linux, Windows, and MacOS X. It is available free of charge under the GNU Lesser General Public License (LGPL). Parts of the code are distributed under the GNU General Public License (GPL). BALL is available as source code and binary packages from the project web site at http://www.ball-project.org. It has recently been accepted into the Debian project; integration into further distributions is being pursued.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central